Diff Expression Dataset 1

Preprocessing

Calculate normalization factors

## An object of class "DGEList"
## $counts
##                    R  R  R  R  R  R E  E  E  E E  E
## ENSECAG00000002020 0  0  0  0  0  0 0  0  0  0 0  0
## ENSECAG00000006502 0  0  0  0  0  0 0  0  0  0 0  0
## ENSECAG00000023919 3 14 18 14 24 24 4 29 16 28 8 25
## ENSECAG00000002066 0  0  0  0  0  0 0  0  0  0 0  0
## ENSECAG00000006515 0  0  0  0  0  0 0  0  0  0 0  0
## 26949 more rows ...
## 
## $samples
##   group lib.size norm.factors
## 1     1  2130248    0.6357591
## 2     1 16246106    1.0234993
## 3     1 13764962    0.9870797
## 4     1 19432713    1.0187328
## 5     1 14973545    1.0570107
## 7 more rows ...

Voom transformation

Model Matrix

Specify the model to be fitted.

##    groupE groupR
## 1       0      1
## 2       0      1
## 3       0      1
## 4       0      1
## 5       0      1
## 6       0      1
## 7       1      0
## 8       1      0
## 9       1      0
## 10      1      0
## 11      1      0
## 12      1      0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

Fitting linear models

limma lmFit fits a linear model using weighted least squares for each gene:

Comparison between before exercise and after exercise

##         Contrasts
## Levels   groupE - groupR
##   groupE               1
##   groupR              -1

Estimate contrast for each gene

Empirical Bayes smoothing of standard errors

What genes are most differentially expressed?

Top Genes

Results

PS: if you click on the gene, it will lead you it’s ensembl page.

Diff Expression Dataset 2

Preprocessing

Calculate normalization factors

## An object of class "DGEList"
## $counts
##                    R R   R  R  R  R E  E  E   E   E  E
## ENSECAG00000002020 0 0   0  0  0  0 0  0  0   0   0  0
## ENSECAG00000006502 0 0   0  0  0  0 0  0  0   0   0  0
## ENSECAG00000023919 3 6 167 47 60 67 4 54 23 111 125 41
## ENSECAG00000002066 2 0   0  2  0  6 1  4  2   2   2 12
## ENSECAG00000006515 0 0   0  0  0  0 0  0  0   1   0  0
## 26949 more rows ...
## 
## $samples
##   group lib.size norm.factors
## 1     1  1214190     1.048777
## 2     1  1136857     1.003369
## 3     1 23500486     1.054662
## 4     1 23524764     1.028951
## 5     1 24598794     0.939532
## 7 more rows ...

Voom transformation

Model Matrix

Specify the model to be fitted.

##    groupE groupR
## 1       0      1
## 2       0      1
## 3       0      1
## 4       0      1
## 5       0      1
## 6       0      1
## 7       1      0
## 8       1      0
## 9       1      0
## 10      1      0
## 11      1      0
## 12      1      0
## attr(,"assign")
## [1] 1 1
## attr(,"contrasts")
## attr(,"contrasts")$group
## [1] "contr.treatment"

Fitting linear models

limma lmFit fits a linear model using weighted least squares for each gene:

Comparison between before exercise and after exercise

##         Contrasts
## Levels   groupE - groupR
##   groupE               1
##   groupR              -1

Estimate contrast for each gene

Empirical Bayes smoothing of standard errors

What genes are most differentially expressed?

Top Genes

Results

PS: if you click on the gene, it will lead you it’s ensembl page.

SNP identification

VCF Stats

Functions

novel_count <- function(horsex){
novel=length(which(horsex[,1]=="."))
return(novel)
}

known_count <-function(horsex){
   known=length(which(horsex[,1]!="."))
   return(known)
}

conseq<- function(horsexx){
  total = nrow(horsexx)
  others=100
 intron_variantt = ((length(which(horsexx[,4]=="intron_variant"))))
 intron_variant=paste0(floor(intron_variantt*100), "%")
 
  downstream_gene_variantt=((length(which(horsexx[,4]=="downstream_gene_variant"))))
  downstream_gene_variant=paste0(floor(downstream_gene_variantt*100), "%")
  
   synonymous_variantt=((length(which(horsexx[,4]=="synonymous_variant"))))
  synonymous_variant=paste0(floor(synonymous_variantt*100), "%")
  
   upstream_gene_variant=((length(which(horsexx[,4]=="upstream_gene_variant"))))
  upstream_gene_varian=paste0(floor(upstream_gene_variant*100), "%")
  
     missense_variantt=((length(which(horsexx[,4]=="missense_variant"))))
  missense_variant=paste0(floor(missense_variantt*100), "%")
  
    three_prime_UTR_variantt=((length(which(horsexx[,4]=="3_prime_UTR_variant"))))
  three_prime_UTR_variant=paste0(floor(three_prime_UTR_variantt*100), "%")
  
  intergenic_variantt=((length(which(horsexx[,4]=="intergenic_variant"))))
  intergenic_variant=paste0(floor(intergenic_variantt*100), "%")
  
  splice_region_variantt=((length(which(horsexx[,4]=="splice_region_variant"))))
  splice_region_variant=paste0(floor(splice_region_variantt*100), "%")
  
    non_coding_transcript_variantt=((length(which(horsexx[,4]=="non_coding_transcript_variant"))))
  non_coding_transcript_variant=paste0(floor(non_coding_transcript_variantt*100), "%")
  
   otherss=others-intron_variantt-non_coding_transcript_variantt-splice_region_variantt-intergenic_variantt-three_prime_UTR_variantt-missense_variantt-upstream_gene_variant-synonymous_variantt-downstream_gene_variantt
   others=paste0(floor(others*100), "%")
  
   return(c(intron_variantt,downstream_gene_variantt, upstream_gene_variant,synonymous_variantt,missense_variantt, three_prime_UTR_variantt,intergenic_variantt,splice_region_variantt,non_coding_transcript_variantt, otherss))
}

Known vs Novel

### Consequence stats Each interactive pie graph represents a horse. By putting the cursor on the pie you can identify which horse it represents.

Results

We will only visualize 1 gene for simplicity. However, in practice we did generate plots for the 64 genes.

LogFC

Gene’s LogFC for each horse.